1  Summary of project goals


The  specific  challenge  tackled  in  this  project  is  detection,  localization  and  recovery  of  weak  and 
distributed  patterns  of  activation  in  a  network.  Weak  patterns  of  activation  in  a  network  arise  in 
myriad  problems  including  identification  of  incipient  contamination  or  seismic  activity  monitored 
by  a  sensor  network,  onset  of  a  virus  in  the  Internet,  covert  signals  in  communication  networks, 
or  anomalous  social  activity.  Moreover,  the  distributed  nature  of  these  patterns  implies  that  they 
are  undetectable  in  local  signatures  of  individual  nodes,  as  well  as  in  network-wide  aggregates. 
As  a  result,  the  solution  to  this  problem  hinges  on  the  development  of  novel  data  fusion  methods 
that  leverage  the  structure  of  the  underlying  network.  Since  the  number  of  possible  activation 
patterns  can  grow  exponentially  with  network  size,  conventional  estimators  and  detectors  such 
as  scan  statistic  or  generalized  likelihood  ratio  that  scan  over  all  patterns  are  computationally 
intractable.  On  the  other  hand,  attempts  to  develop  feasible  detectors  such  as  fast  subset  scanning 
or  averaging/thresholding  require  high  Signal-to-Noise  Ratios  (SNRs).  Furthermore,  there  are 
constraints  on  resources  such  as  limits  on  storage,  sensing,  communication  energy  or  bandwidth. 

The  goals  of  this  project  were  to  address  the  following  problems: 

1.  Determine theoretical limits of detection, localization and recovery of weak distributed
activations in large-scale networked systems.

2.  Develop practical, computationally efficient algorithms that require minimal SNR and
measurement resources to identify weak and distributed patterns of network activity.


2  Significant  work  accomplished 

This section summarizes the theory and methods developed in this project for the problems of
detecting, localizing and estimating weak and distributed graph-structured patterns under 1) a
direct measurement model and 2) a compressive and adaptive measurement model.

2.1  Direct  measurement  model 

Under this model, the observations correspond to a single measurement at each node of a known
network graph $G = (V, E)$, i.e.,

$$y_i = x_i + \epsilon_i, \qquad i = 1, \dots, |V|,$$

where $x_i$ is the true underlying activation at node $i$, corrupted by additive white Gaussian
noise $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$.
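As a small illustration of the direct measurement model, the sketch below draws one noisy observation per node. The function name `observe`, the seed and the example activation vector are assumptions made for illustration only; they are not part of the report.

```python
import random

def observe(x, sigma, seed=0):
    """Return y with y_i = x_i + eps_i, eps_i ~ N(0, sigma^2) drawn independently."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [x_i + rng.gauss(0.0, sigma) for x_i in x]

# Hypothetical example: 8 nodes, three of which carry an activation of 1.5.
x = [0.0, 0.0, 1.5, 1.5, 1.5, 0.0, 0.0, 0.0]
y = observe(x, sigma=1.0)
```

With `sigma=0.0` the observations coincide with the activations, which is a quick sanity check on the model.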

I.  Detection: The goal of detection is to distinguish between the two hypotheses:

$$H_0 : x = 0$$
$$H_1 : x = \mu \mathbf{1}_C$$

Here $x = \{x_i\}_{i \in V}$ and $C \in \mathcal{C}_{c,\rho} := \{C \subseteq V : |C| = c, |\partial C| \le \rho\}$ denotes the set of (possibly
disconnected) activated vertices with size $|C| = c$ and cut-size $|\partial C| := |\{(i,j) \in E : i \in C, j \notin C\}|$
less than or equal to a constant $\rho > 0$ (see footnote 1). For a given sparsity level $c$, smaller
values of $\rho$ imply that the set of activated nodes is localized on the graph. The goal is to
develop computationally efficient detectors that can distinguish between $H_0$ and $H_1$ at very
low signal-to-noise ratios (SNRs) $\mu/\sigma$.

The Generalized Likelihood Ratio Test (GLRT) statistic, also known as the combinatorial scan
statistic, for this hypothesis testing problem is given by

$$\max_{C \in \mathcal{C}_{c,\rho}} \mathbf{1}_C^\top y$$
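To make the combinatorial nature of this scan concrete, the following sketch enumerates every vertex subset of size $c$ with cut size at most $\rho$ and returns the one maximizing $\mathbf{1}_C^\top y$. It is only feasible for tiny graphs, which is the point. The helper names and the 6-node cycle example are assumptions for illustration.

```python
from itertools import combinations

def cut_size(subset, edges):
    """Number of edges with exactly one endpoint in `subset`."""
    s = set(subset)
    return sum((i in s) != (j in s) for i, j in edges)

def brute_force_scan(y, edges, c, rho):
    """Maximize 1_C^T y over all C with |C| = c and cut size <= rho."""
    best_value, best_set = float("-inf"), None
    for subset in combinations(range(len(y)), c):  # C(|V|, c) candidates
        if cut_size(subset, edges) <= rho:
            value = sum(y[i] for i in subset)
            if value > best_value:
                best_value, best_set = value, subset
    return best_value, best_set

# Hypothetical example: a 6-node cycle with elevated values on nodes 1, 2, 3.
edges = [(i, (i + 1) % 6) for i in range(6)]
y = [0.1, 2.0, 1.8, 2.2, -0.3, 0.0]
value, subset = brute_force_scan(y, edges, c=3, rho=2)
```

The number of candidate subsets is $\binom{|V|}{c}$, so the loop becomes impractical very quickly as the graph grows.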

While the GLRT, or a scan over an $\epsilon$-net of the class $\mathcal{C}_{c,\rho}$, is optimal in many cases [1, 2, 3],
it is computationally intractable. There has been some work on developing fast graph
subset scanning methods [4], but these greedy methods sacrifice statistical power. This project
developed  detectors  for  weak  graph-structured  patterns  by  borrowing  tools  from  graph  theory, 
optimization  and  machine  learning.  These  detectors  are  computationally  efficient,  applicable 
to  graphs  and  patterns  with  general  structures  and  come  with  precise  theoretical  guarantees, 
often  achieving  near-optimal  statistical  performance. 

•  The spectral scan statistic (SSS) developed in [5] is obtained by a convex spectral
relaxation of the combinatorial scan statistic, inspired by the relaxation used in the spectral
clustering algorithm in machine learning. This involves relaxing the cut size constraint
using the graph Laplacian matrix $\Delta = D - A$, where $A$ denotes the adjacency matrix of the
graph and $D$ is a diagonal matrix with vertex degrees on the diagonal, i.e. $D_{ii}$ is the degree of vertex $i$.

The cut size can be written as $|\partial C| = \mathbf{1}_C^\top \Delta \mathbf{1}_C$, suggesting that the domain of the GLRT
can be relaxed to $z^\top \Delta z \le \rho$, where $z \in \mathbb{R}^{|V|}$ relaxes the vector $\mathbf{1}_C$. The resulting spectral
scan statistic is defined as follows, where $\bar{y} = y - (\mathbf{1}^\top y / |V|)\mathbf{1}$:

$$\hat{s} = \sup_{z \in \mathbb{R}^{|V|}} (z^\top \bar{y})^2 \quad \text{s.t.} \quad z^\top \Delta z \le \rho, \; \|z\| \le 1, \; z^\top \mathbf{1} = 0.$$

As shown in [5], the convex spectral scan statistic can be solved efficiently in the dual
domain by first-order interior point methods. The SNR required by the SSS is characterized
as follows. Here $a = \omega(b)$ denotes that $a/b \to \infty$.

Theorem 1. [5] The spectral scan statistic asymptotically distinguishes $H_0$ from $H_1$ if


where $\lambda_i$ are the eigenvalues of the graph Laplacian matrix $\Delta$ sorted in ascending order.

This result suggests that the SNR required by SSS scales with the complexity of the
pattern class (cut-size to size ratio $\rho/c$, or equivalently the surface-to-volume ratio,
of the activated vertices), as well as the complexity of the graph (decay of Laplacian
eigenvalues). The graph spectrum and this bound are evaluated for specific low-cut and
sparse patterns on specific graphs (e.g. subtrees of activation in a tree graph, squares of
activation in a 2-dimensional torus or multi-resolution groups in Kronecker graphs) in
[5]. An extension of this work, the Graph Ellipsoid Scan Statistic (GESS), was recently
developed; it upper and lower bounds the SSS and enables tighter performance
bounds. Work on GESS is currently in preparation for submission [6].

Footnote 1: Some of the methods developed apply to more general composite null hypotheses that allow for piecewise constant
activations, but for simplicity we focus on this setup in the report.
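The spectral scan statistic defined above can be evaluated numerically on small graphs. The sketch below is not the solver of [5]. It assumes a connected graph and uses a one-dimensional form of the Lagrangian dual: writing $w_i$ for the squared coefficients of $y$ in the non-constant Laplacian eigenvectors, the supremum equals $\min_{t \ge 0} (1 + t\rho) \sum_{i \ge 2} w_i / (1 + t\lambda_i)$. The grid search over $t$ and all function names are illustrative assumptions, and the returned value is a slight over-estimate of the minimum.

```python
import numpy as np

def laplacian(num_nodes, edges):
    """Graph Laplacian D - A of an undirected, unweighted graph."""
    lap = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        lap[i, i] += 1.0
        lap[j, j] += 1.0
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
    return lap

def spectral_scan_statistic(y, lap, rho, grid=None):
    """Approximate sup (z^T y)^2 s.t. z^T L z <= rho, ||z|| <= 1, z^T 1 = 0.

    Evaluates the dual objective (1 + t*rho) * sum_i w_i / (1 + t*lam_i)
    on a grid of t >= 0 and returns the smallest value found.
    Assumes a connected graph, so only the first eigenvector is constant.
    """
    lam, vecs = np.linalg.eigh(lap)              # eigenvalues in ascending order
    coeffs = vecs.T @ np.asarray(y, dtype=float)
    lam, w = lam[1:], coeffs[1:] ** 2            # drop the constant direction
    if grid is None:
        grid = np.concatenate(([0.0], np.logspace(-6, 6, 2001)))
    values = [(1.0 + t * rho) * np.sum(w / (1.0 + t * lam)) for t in grid]
    return float(min(values))

# Hypothetical example: 8-node cycle with four adjacent elevated nodes.
edges = [(i, (i + 1) % 8) for i in range(8)]
lap = laplacian(8, edges)
y = np.array([3.0, 3.2, 2.9, 3.1, 0.1, -0.2, 0.0, 0.1])
s_tight = spectral_scan_statistic(y, lap, rho=0.5)
s_loose = spectral_scan_statistic(y, lap, rho=100.0)
```

When $\rho$ exceeds the largest Laplacian eigenvalue the cut constraint is inactive and the statistic reduces to the squared norm of the centered observations, which gives a simple check of the sketch.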

Some  information-theoretic  lower  bounds  for  this  problem  are  also  derived  in  [5,  7]  which 
reveal  that  while  the  SSS  and  GESS  are  nearly-optimal  for  non-sparse  activations  (large 
c),  their  performance  is  suboptimal  for  sparse  patterns,  except  for  very  specific  graphs. 
The  remaining  two  detectors  described  below  overcome  this  drawback  and  perform  better 
with  a  small  set  of  activated  vertices. 


•  The Lovász extended scan statistic (LESS) [8] is another relaxation of the GLRT
obtained as follows. The GLRT can be written in terms of the binary vector $z = \mathbf{1}_C \in \{0,1\}^{|V|}$ as

$$\max_{z \in \{0,1\}^{|V|}} \frac{z^\top y}{\sqrt{c}} \quad \text{s.t.} \quad \sum_{(i,j) \in E} \mathbb{I}\{z_i \ne z_j\} \le \rho, \; \mathbf{1}^\top z = c.$$

Submodularity is the combinatorial analogue of convexity, and it turns out that the
cut size ($|\partial C|$) is submodular. For every submodular function there exists a convex
relaxation, called the Lovász extension. The Lovász extension of $|\partial C| = \sum_{(i,j) \in E} \mathbb{I}\{z_i \ne z_j\}$
is the total variation $\sum_{(i,j) \in E} |z_i - z_j|$. Thus, it is natural to relax the GLRT as
follows:

$$\hat{l} = \max_{z \in [0,1]^{|V|}} z^\top y \quad \text{s.t.} \quad \sum_{(i,j) \in E} |z_i - z_j| \le \rho, \; \mathbf{1}^\top z = c, \qquad (1)$$

which is called the LESS. In [8], convex analysis is used to derive the dual program
to the LESS, and it is shown that LESS can be solved efficiently using methods for finding
graph cuts. The SNR required by LESS depends on $r_{\max}$, the maximum effective resistance
of the graph cut induced by a pattern in $\mathcal{C}_{c,\rho}$. Formally, $r_{\max} = \max_{C \in \mathcal{C}_{c,\rho}} \sum_{e \in \partial C} r_e$,
where $r_e$ is the effective resistance of the edge $e$.
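The claim that the total variation is the Lovász extension of the cut size can be checked numerically. The sketch below computes the Lovász extension of a generic set function from its standard definition (sort the coordinates in decreasing order and weight the marginal gains along the resulting chain of sets) and compares it with the total variation. The function names and the 5-node example graph are assumptions for illustration.

```python
def cut_size(subset, edges):
    """Number of edges with exactly one endpoint in `subset`."""
    s = set(subset)
    return sum((i in s) != (j in s) for i, j in edges)

def lovasz_extension(set_fn, z):
    """Lovász extension of set_fn at z, via the sorted-chain formula."""
    order = sorted(range(len(z)), key=lambda i: -z[i])  # decreasing z
    value, chain, previous = 0.0, [], set_fn([])
    for i in order:
        chain.append(i)
        current = set_fn(chain)
        value += z[i] * (current - previous)  # marginal gain of adding i
        previous = current
    return value

def total_variation(z, edges):
    """Sum of |z_i - z_j| over the edges of the graph."""
    return sum(abs(z[i] - z[j]) for i, j in edges)

# Hypothetical example: a 5-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]
z = [0.2, 0.9, 0.5, 0.7, 0.0]
ext = lovasz_extension(lambda s: cut_size(s, edges), z)
tv = total_variation(z, edges)
```

On binary vectors both quantities reduce to the cut size, which is why constraining the total variation is a relaxation of the original cut constraint.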

Theorem 2. [8] The Lovász extended scan statistic asymptotically distinguishes $H_0$ from
$H_1$ if

P  —  u  ^ ymax(rmax,log(V|))log(V| ) 


By Foster's theorem, the effective resistance of a cut is $\approx \rho/d$, where $d$ is the average
degree of a vertex. This intuition can be formalized for specific graphs such as edge
transitive graphs (including the lattice and complete graphs) and random geometric
graphs (such as $k$-nearest neighbor and $\epsilon$-nearest neighbor graphs). For these cases,
a comparison with information-theoretic lower bounds suggests that LESS is nearly
optimal (see footnote 2). If $r_{\max} \approx \rho/d \ll c$, the active nodes are localized on the graph and the
detector takes advantage of structured sparsity. On the other hand, if $r_{\max} \approx \rho/d \approx c$,
the pattern is not localized and the SNR requirement degrades gracefully to $\sqrt{\log |V|}$
(up to log factors), which is characteristic of unstructured tests (that do not leverage
knowledge of the graph) such as the max statistic or Higher Criticism [9].
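Effective resistances are easy to compute on small graphs from the pseudo-inverse of the Laplacian, using $r_e = (e_i - e_j)^\top \Delta^{+} (e_i - e_j)$ for an edge $e = (i, j)$. The sketch below does this and checks Foster's theorem, which states that the edge resistances of a connected graph sum to $|V| - 1$. The function names and the complete-graph example are assumptions for illustration.

```python
import numpy as np

def effective_resistances(num_nodes, edges):
    """Effective resistance of every edge of an unweighted, connected graph."""
    lap = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        lap[i, i] += 1.0
        lap[j, j] += 1.0
        lap[i, j] -= 1.0
        lap[j, i] -= 1.0
    pinv = np.linalg.pinv(lap)  # Moore-Penrose pseudo-inverse of the Laplacian
    return {(i, j): float(pinv[i, i] + pinv[j, j] - 2.0 * pinv[i, j])
            for i, j in edges}

def cut_resistance(subset, resistances):
    """Sum of effective resistances over the edges leaving `subset`."""
    s = set(subset)
    return sum(r for (i, j), r in resistances.items() if (i in s) != (j in s))

# Hypothetical example: the complete graph on 5 vertices.
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
res = effective_resistances(5, edges)
foster_sum = sum(res.values())          # Foster's theorem: equals |V| - 1
r_cut = cut_resistance({0, 1}, res)     # resistance of the cut around {0, 1}
```

Since the resistances sum to $|V| - 1$ over $|E| = |V| d / 2$ edges, the average edge resistance is about $2/d$, so a cut of $\rho$ edges has resistance on the order of $\rho/d$.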

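Footnote 2: The necessary SNR for sparse patterns essentially scales like $\sqrt{(\rho/(c\,d_{\max})) \log |V|}$ [7].

Algorithm 1  FormWavelets

Require: $S = \{T_i\}_{i=1}^{d_v}$

(1)  Let $\mathcal{T}_1 = \cup_{i \le |S|/2} T_i$ and $\mathcal{T}_2 = \cup_{i > |S|/2} T_i$.

(2)  Form the following basis element and add it to $B$:

$$b = \sqrt{\frac{|\mathcal{T}_1| |\mathcal{T}_2|}{|\mathcal{T}_1| + |\mathcal{T}_2|}} \left[ \frac{1}{|\mathcal{T}_1|} \mathbf{1}_{\mathcal{T}_1} - \frac{1}{|\mathcal{T}_2|} \mathbf{1}_{\mathcal{T}_2} \right]$$

(3)  Recurse at (1) with $S \leftarrow \{T_i\}_{i \le |S|/2}$ and $\{T_i\}_{i > |S|/2}$ separately.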
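Algorithm 1 can be sketched in a few lines of Python. The version below takes the subtrees as plain lists of vertex indices and returns the basis elements as dense vectors. The function name, the list-based representation and the example grouping are assumptions for illustration; the full construction in [7] also involves the balancing vertex and further recursion inside each subtree, which this sketch does not cover.

```python
import math

def form_wavelets(groups, num_nodes, basis=None):
    """Sketch of Algorithm 1: `groups` is a list of disjoint vertex lists."""
    if basis is None:
        basis = []
    if len(groups) < 2:          # nothing left to split
        return basis
    half = len(groups) // 2      # first half: indices i <= |S|/2
    left, right = groups[:half], groups[half:]
    t1 = [v for g in left for v in g]
    t2 = [v for g in right for v in g]
    scale = math.sqrt(len(t1) * len(t2) / (len(t1) + len(t2)))
    b = [0.0] * num_nodes
    for v in t1:
        b[v] = scale / len(t1)   # positive, constant on the first union
    for v in t2:
        b[v] = -scale / len(t2)  # negative, constant on the second union
    basis.append(b)
    form_wavelets(left, num_nodes, basis)   # step (3): recurse on each half
    form_wavelets(right, num_nodes, basis)
    return basis

# Hypothetical example: four subtrees covering 7 vertices.
groups = [[0, 1], [2], [3, 4, 5], [6]]
basis = form_wavelets(groups, num_nodes=7)
```

Each element has unit norm and sums to zero on its support, and elements created deeper in the recursion live inside a region where the earlier ones are constant, so the resulting vectors are orthonormal.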


•  The graph wavelet statistic [10, 7] can be obtained by constructing an orthonormal
wavelet basis $B = [b_1, \dots, b_{|V|}]$ for the graph with the property that every pattern in
$\mathcal{C}_{c,\rho}$ has a sparse representation in terms of the basis coefficients. Projecting the node
observations onto such a basis concentrates the signal energy in a few coefficients
while the noise distribution remains the same, thus boosting the SNR. This leads to
natural detectors based on thresholding the maximum wavelet coefficient

$$\max_{b \in B} b^\top y$$

which is equivalent to scanning over an $\epsilon$-net of $\mathcal{C}_{c,\rho}$.

For  hierarchically-structured  network  patterns  characterized  by  a  latent  tree  graph,  such 
an  orthonormal  unbalanced  Haar  wavelet  basis  was  developed  [10].  This  construction  was 
then  extended  to  low-cut  activation  patterns  on  general  graph  structures  by  leveraging 
the  spanning  tree  of  a  graph  to  correspond  to  the  latent  tree  [7].  Specifically,  for  general 
graphs,  the  graph  wavelet  construction  relies  on  the  uniform  spanning  tree  (UST)  which 
can  be  constructed  in  time  nearly  linear  in  the  number  of  vertices  for  most  graphs 
using  the  Aldous-Broder  algorithm  [11].  Given  a  UST,  the  wavelet  construction  iterates 
the  following  steps:  finding  a  balancing  vertex,  removing  it  from  the  uniform  spanning 
tree,  forming  a  basis  that  spans  the  resulting  connected  components,  and  recursing  on 
the  remaining  subtrees.  A  balancing  vertex  is  one  such  that  the  remaining  connected 
components,  after  its  removal  from  the  tree,  are  at  most  half  the  size  of  the  graph.  A 
simple algorithm that travels in the direction of the largest subtree at a vertex can be
used to find this in nearly $O(|V|)$ time. The wavelet construction is summarized in
Algorithm 1, which takes as input the connected subtrees $S = \{T_i\}_{i=1}^{d_v}$ after the removal
of the balancing vertex $v$, where $d_v$ is the degree of vertex $v$. The SNR required by the
UST wavelet detector is given as follows.

Theorem 3. [7] The uniform spanning tree wavelet statistic asymptotically distinguishes
$H_0$ from $H_1$ if

/  ^  ^  ^  ^J’max  log(^  max  )log2(VI)j 

where $d_{\max}$ is the maximum degree of the graph $G$.
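The balancing-vertex search used in the wavelet construction (move toward the largest subtree until no subtree holds more than half of the vertices) can be sketched as follows. The adjacency-dictionary representation and the function name are assumptions for illustration, and the input is assumed to be a tree.

```python
def balancing_vertex(adjacency):
    """Return a vertex of a tree whose removal leaves components of size <= n/2.

    adjacency: dict mapping each vertex to a list of its neighbours.
    """
    n = len(adjacency)
    root = next(iter(adjacency))
    parent, order = {root: None}, [root]
    for v in order:                       # breadth-first traversal from the root
        for u in adjacency[v]:
            if u not in parent:
                parent[u] = v
                order.append(u)
    size = {v: 1 for v in adjacency}      # subtree sizes, filled leaves-first
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]
    v = root
    while True:                           # walk toward the heaviest subtree
        heavy = [u for u in adjacency[v] if u != parent[v] and size[u] > n / 2]
        if not heavy:
            return v
        v = heavy[0]

# Hypothetical example: a path on 7 vertices, whose balancing vertex is the middle one.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
center = balancing_vertex(path)
```

The traversal and the size computation each visit every vertex once, and the final walk never revisits a vertex, so the search runs in time linear in the number of vertices.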

This performance bound is similar to that of LESS, and similar to LESS the UST wavelet
detector is nearly optimal for many graphs and pattern classes. It also takes advantage
of structured sparsity and degrades gracefully for unstructured settings.

Figure 1: (a) ROC curves for the spectral scan statistic (SSS), the uniform spanning tree wavelet statistic
(Wavelet), the maximum statistic $\max_i |y_i|$ (Max), and the Lovász extended scan statistic (LESS). The
graphs used are a square 2D torus (top) and an $\epsilon$-NN graph (bottom) with $\epsilon \sim |E|^{-1/3}$, with $\mu = 4, 3$
respectively, $|V| = 225$, and $c \sim |V|^{1/2}$. (b) Comparison of the wavelet detector with the maximum and
aggregate statistics on a torus with increasing size of activated cluster, for a fixed cut size.


A comparison of the three detectors will appear in [12] for graph-structured patterns simulated
over a 2-dimensional torus and an $\epsilon$-NN random graph. Fig. 1(a) reports the true positive rate
versus the false positive rate as the threshold varies (the receiver operating
characteristic, or ROC, curve). LESS provides a tight relaxation and hence performs better than SSS.
The  wavelet  detector,  though  theoretically  optimal,  suffers  from  additional  log  factors  which 
make  its  performance  slightly  inferior  to  LESS.  For  each  graph,  all  of  the  developed  detectors 
dominate  the  max  statistic,  indicating  that  one  cannot  ignore  graph  structure  and  hope  to 
detect  at  optimal  SNRs. 

To demonstrate that the proposed detectors degrade gracefully when the ratio of cut size to cluster
size becomes large, the wavelet detector is compared to two unstructured detectors based
on  the  maximum  and  global  average  of  all  observations.  The  global  aggregate  statistic  is 
expected  to  work  well  when  the  cluster  size  is  very  large.  Fig.  1(b)  shows  that,  for  a  fixed  cut 
size,  the  wavelet  detector  degrades  to  the  aggregate  and  maximum  tests  for  very  large  and 
very  small  cluster  sizes  respectively,  but  outperforms  them  when  the  pattern  is  localized  on 
the  graph  (not  globally  spread  or  too  sparse  such  that  graph  structure  cannot  be  leveraged). 


II.  Estimation: The goal of estimation is to de-noise the node observations and recover the
underlying activation pattern $x$ accurately in mean-square-error (MSE). In this problem, $x$
does not necessarily correspond to a binary activation. Instead, we focus on the class of
activations that are smooth with respect to the graph $G$, i.e. if two nodes are connected
by an edge, their activations are similar. This can be formalized by considering the class of
patterns

$$\mathcal{X}_\rho = \{x : x^\top \Delta x = \sum_{(i,j) \in E} (x_i - x_j)^2 \le \rho\}$$

where $\Delta$ denotes the graph Laplacian, as before. Such activation patterns (with a specific $\rho$)
also arise with high probability when sampled from a Gaussian graphical model or an Ising
model [13].

Patterns that are smooth over a known graph can be denoised by projection onto the graph
Laplacian eigenbasis. Consider the spectral decomposition of the graph Laplacian $\Delta = U \Lambda U^\top$,
and denote the first $k$ eigenvectors (corresponding to the smallest eigenvalues) of $\Delta$
by $U_{[k]}$. Define the estimator

$$\hat{x}_k = U_{[k]} U_{[k]}^\top y$$

which is a hard thresholding of the projection of node measurements onto the graph Laplacian
eigenbasis. This estimator reduces to some well-known estimators for specific graphs, e.g. for
regular grids (lattice graphs), the Laplacian eigenbasis corresponds to the Fourier basis, and for
hierarchical graphs, the Laplacian eigenbasis corresponds to the wavelet basis. The following
theorem bounds the MSE of this estimator.

Theorem 4. [13] The maximum MSE of the Projected Graph Laplacian estimator can be
bounded as

$$\sup_{x \in \mathcal{X}_\rho} \mathbb{E}[\|\hat{x}_k - x\|^2] \le \min(|E|, \rho/\lambda_{k+1}) + k\sigma^2$$

where $\lambda_1 \le \lambda_2 \le \dots$ are the ordered eigenvalues of $\Delta$.
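The projection estimator is straightforward to try on a small graph. The sketch below builds the Laplacian of a path graph, projects noisy observations of a smooth signal onto the $k$ lowest-frequency eigenvectors, and exposes the eigenvalues so the two terms of the bound can be inspected. The path graph, the sine-shaped signal, the noise level and $k = 5$ are assumptions chosen only for illustration.

```python
import numpy as np

def laplacian_projection_estimator(y, lap, k):
    """Project y onto the k Laplacian eigenvectors with the smallest eigenvalues."""
    eigenvalues, eigenvectors = np.linalg.eigh(lap)  # ascending eigenvalues
    u_k = eigenvectors[:, :k]
    return u_k @ (u_k.T @ y), eigenvalues

# Hypothetical example: path graph on 50 nodes with a smooth activation.
n = 50
lap = np.zeros((n, n))
for i in range(n - 1):
    lap[i, i] += 1.0
    lap[i + 1, i + 1] += 1.0
    lap[i, i + 1] -= 1.0
    lap[i + 1, i] -= 1.0

x = np.sin(np.linspace(0.0, np.pi, n))        # smooth with respect to the path
rng = np.random.default_rng(0)
y = x + 0.5 * rng.standard_normal(n)          # sigma = 0.5

x_hat, lam = laplacian_projection_estimator(y, lap, k=5)
error_raw = float(np.sum((y - x) ** 2))       # error of the unprocessed data
error_proj = float(np.sum((x_hat - x) ** 2))  # error after projection
```

The signal discarded by the projection is at most $x^\top \Delta x / \lambda_{k+1}$, while the retained noise has expected energy $k\sigma^2$, which mirrors the two terms in Theorem 4.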

The two terms in the bound indicate a tradeoff between the amount of signal discarded (first
term) and the amount of noise retained (second term) by projecting onto the first $k$ Laplacian
eigenvectors. By evaluating the eigenspectrum of various graphs, it is possible to establish an
appropriate scaling of $k$ with graph size $|V|$ and the amount of noise that can be tolerated
while ensuring MSE-consistent recovery, i.e. MSE $\to 0$ as the graph size $|V| \to \infty$. For
many  example  graphs,  it  is  observed  that  the  tolerable  noise  level  scales  as  a 2  =  o(p7),  where 
7  E  (0,1)  characterizes  the  strength  of  network  interactions  [13].  For  example,  for  lattice 
graphs  the  noise  tolerance  results  if  the  node  degrees  scale  as  7/  log  |  W|  (higher  7  implying 
more  neighbors  per  node),  for  hierarchical  graphs  this  requires  that  the  non-zero  interactions 
exist  until  level  7  log  |  W  |  going  bottom- up  (higher  7  implying  interactions  between  nodes 
at  coarse  scales  in  the  hierarchy),  and  for  Erdos- Renyi  graphs  the  noise  tolerance  results  if 
probability  of  an  edge  scales  as  (higher  7  implying  more  connectivity). 

III.  Localization: The goal of localization is to identify the set of edges across which the true
underlying activation differs, i.e.

$$\partial C = \{(i,j) \in E : x_i \ne x_j\}$$

based on noisy observations $\{y_i\}_{i=1}^{|V|}$.




This problem can be solved via the “edge lasso”, which arises as a special case of the generalized
fused lasso optimization described in the literature [14]:

$$\min_x \frac{1}{2}\|y - x\|_2^2 + \lambda \|Dx\|_1$$

where the matrix $D \in \mathbb{R}^{|E| \times |V|}$ specifies the constraints imposed by the graph structure.
Specifically, each row of the matrix $D$ corresponds to an edge $(i,j) \in E$ and the entries are
zero except for a $+1$ for node $i$ and $-1$ for node $j$. Thus, the optimization seeks a
least-squares fit to the noisy observations while penalizing the $\ell_1$ norm of the differences of
the fitted values across edges in $G$. This project investigated conditions under which the edge
lasso is sparsistent, i.e. the edges over which the estimate of $x$ differs agree exactly with the edge set $\partial C$,
asymptotically for large graph sizes $|V| \to \infty$.
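For small graphs the edge lasso objective can be minimized with a generic ADMM iteration, splitting $z = Dx$. This is a swapped-in, textbook solver used only to illustrate the optimization; it is not the method analyzed in [15]. The function names, the ADMM penalty parameter, the iteration count and the path-graph example are assumptions.

```python
import numpy as np

def incidence_matrix(num_nodes, edges):
    """One row per edge (i, j): +1 at node i, -1 at node j."""
    d = np.zeros((len(edges), num_nodes))
    for row, (i, j) in enumerate(edges):
        d[row, i], d[row, j] = 1.0, -1.0
    return d

def edge_lasso_admm(y, edges, lam, penalty=1.0, iterations=5000):
    """Minimize 0.5*||y - x||^2 + lam*||D x||_1 by ADMM with the split z = D x."""
    y = np.asarray(y, dtype=float)
    d = incidence_matrix(len(y), edges)
    solve = np.linalg.inv(np.eye(len(y)) + penalty * d.T @ d)
    z = np.zeros(len(edges))
    u = np.zeros(len(edges))                   # scaled dual variable
    for _ in range(iterations):
        x = solve @ (y + penalty * d.T @ (z - u))
        dx = d @ x
        shifted = dx + u
        z = np.sign(shifted) * np.maximum(np.abs(shifted) - lam / penalty, 0.0)
        u = u + dx - z
    return x, z

def estimated_cut(z, edges, tol=1e-8):
    """Edges whose fitted difference is non-zero."""
    return [e for e, value in zip(edges, z) if abs(value) > tol]

# Hypothetical example: path on 10 nodes with a single jump between nodes 4 and 5.
edges = [(i, i + 1) for i in range(9)]
rng = np.random.default_rng(1)
x_true = np.array([0.0] * 5 + [3.0] * 5)
y = x_true + 0.01 * rng.standard_normal(10)
x_hat, z = edge_lasso_admm(y, edges, lam=1.0)
cut = estimated_cut(z, edges)
```

Reading the support from the soft-thresholded variable `z` rather than from `D @ x_hat` gives exact zeros on edges where the fit is constant, which is the property sparsistency refers to.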

Theorem  5.  [15]  Let  A  denote  the  maximally  connected  components  of  C .  For  each  A , 
consider  the  following  notion  of  degree  of  connectivity: 


p(A)  :=  max 


\8WndA\  \W\ 


WCA  \dwndw\  \A\ 

Also  let  dC  denote  the  set  of  edges  that  are  not  in  dC  and  denotes  the  pseudo-inverse  of 
the  graph  Laplacian.  If  for  each  A,  p(A)  =  o(l); 


»=jmmDm±yhrx 


a 


V  \A\ 


log(|3d) 


and  —  —  u 
a 


vm, 


then  the  edge  lasso  is  sparsistent. 

The  theorem  provides  general  conditions  for  the  success  of  edge  lasso. 
While  these  conditions  are  hard  to  comprehend  directly,  evaluating 
them  for  specific  graphs  provides  useful  insights.  As  shown  in  [15], 
for  1-d  and  2-d  lattice  graphs,  the  conditions  imply  that  edge  lasso 
succeeds  at  the  same  SNR  (up  to  log  factors)  as  thresholding  the 
difference  of  observations  at  nodes  connected  by  an  edge.  On  the 
other  hand,  for  more  structured  graphs  such  as  the  nested  complete 
graph (cf. [15]), if the activated vertices have low connectivity as
per $\rho(A)$ (e.g. see Fig. 2), then edge lasso can localize the activated
vertices at much lower SNR.


Figure 2: A nested complete graph with a low-connectivity activated subgraph.


2.2  Compressive  and  adaptive  measurement  model 

So far the focus has been on the direct measurement model. This project also explored the use
of compressive and adaptive measurements to minimize the resource budget needed for detection
and localization of graph-structured patterns. Under the compressive and adaptive measurement
model, each observation corresponds to a (random/passive or sequentially designed/active) linear
combination of the node measurements, i.e.

$$y_i = a_i^\top x + \epsilon_i, \qquad i = 1, \dots, m$$

where the total sensing budget, measured through the norms $\|a_i\|^2$, is bounded.

First, the specific case of a $k_1 \times k_2$ block of activation in an $n_1 \times n_2$ lattice graph structure was
considered, i.e. $x = \mu \mathbf{1}_C$ where $C$ corresponds to a $k_1 \times k_2$ contiguous block [16]. The precise
tradeoffs between the various problem parameters, SNR and the number of measurements required
to reliably detect and localize the block of activation were characterized. The sufficient conditions
are complemented with information-theoretic lower bounds. A summary of known results for the
vector case and results of this project for the block-structured case are provided in Tables 1 and 2,
respectively. Contrary to results in compressed sensing of sparse vectors, where it has been shown
that neither adaptivity nor structure helps reduce the SNR or number of measurements needed
[17, 18, 19, 20], results of this project show that for reliable localization the minimum SNR
needed (or equivalently the number of compressive measurements needed) is strongly influenced by
both structure and the ability to choose measurements adaptively. However, for detection neither
adaptivity nor structure reduces the requirement on the SNR.

Table 1: Known results for a $k$-sparse length-$n$ vector


Detection 

Localization 

Passive 

M  x  /  n 
cr  V  mk 2 

/  n  log  n 

V  rn  ’ 

[21] 

m  y  k  log  n 

Active 

[17] 

M  > 
cr  ^ 

■"  flE 

V  m 

[18,  19,  20] 

Table 2: Findings for a $k_1 \times k_2$ block of activation in a $|V| = n_1 \times n_2$ lattice [16]


Detection 

Localization 

Passive 

M  ^  /  nin2 

cr  ^  Y  Tnmxn.{k\,k2) 

M  ^  /  riin2 

cr  Y 

Active 

^max  Y^, 

These scalings are verified in Figure 3, where plotting the probability of successful localization
against SNR rescaled by the predicted scaling aligns all the curves.


Figure  3:  Probability  of  successful  localization  of  a  sparse  square  block  of  activation  in  a  square 
lattice  vs.  SNRs  rescaled  with  predicted  scaling,  for  100  passive  compressive  measurements  (left) 
and  500  adaptive  compressive  measurements  (right),  averaged  over  100  simulation  runs. 




The upper bound for the detection problem is achieved by a simple detector based on thresholding
the average of measurements obtained using passively designed, constant-valued linear measurements
with $a_i(j) = 1/\sqrt{n_1 n_2}$ for all $i$ and $j$. The upper bound for passive localization is
obtained using a procedure that searches over all contiguous blocks of size $k_1 \times k_2$ and outputs the
one minimizing the squared error. Finally, the upper bound for active localization is attained by
a compressive binary search procedure on a collection of cyclically shifted non-overlapping blocks
that partition the lattice graph. Details of the procedure are available in [16].
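The averaging detector described above is simple enough to sketch directly. With constant measurement vectors $a_i(j) = 1/\sqrt{n_1 n_2}$, every measurement has mean $\mu k_1 k_2 / \sqrt{n_1 n_2}$ under $H_1$ and mean zero under $H_0$, and the average of $m$ measurements has standard deviation $\sigma/\sqrt{m}$. The function names, the block placement arguments and the threshold of three standard deviations are assumptions for illustration.

```python
import math
import random

def block_activation(n1, n2, k1, k2, mu, top=0, left=0):
    """Flattened n1 x n2 lattice with a k1 x k2 block of height mu."""
    return [mu if top <= r < top + k1 and left <= c < left + k2 else 0.0
            for r in range(n1) for c in range(n2)]

def averaged_measurement(x_flat, m, sigma, seed=0):
    """Average of m measurements y_i = a^T x + eps_i with a(j) = 1/sqrt(n)."""
    rng = random.Random(seed)
    signal = sum(x_flat) / math.sqrt(len(x_flat))   # a^T x for the constant a
    return sum(signal + rng.gauss(0.0, sigma) for _ in range(m)) / m

def detect(x_flat, m, sigma, num_std=3.0, seed=0):
    """Declare an activation if the average exceeds num_std * sigma / sqrt(m)."""
    threshold = num_std * sigma / math.sqrt(m)
    return averaged_measurement(x_flat, m, sigma, seed) > threshold

# Hypothetical example: a 2 x 3 block of height 2 in an 8 x 8 lattice.
x_flat = block_activation(8, 8, 2, 3, mu=2.0)
noiseless_average = averaged_measurement(x_flat, m=10, sigma=0.0)
```

In the noiseless case the average equals $\mu k_1 k_2 / \sqrt{n_1 n_2}$, which is $2 \cdot 6 / 8 = 1.5$ for the example above.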

While sequentially designed adaptive compressive measurements yield improvements for the
simple case of a block-structured activation in a lattice graph, it was not clear whether similar
improvements hold for general activation patterns and graph structures. This question was explored
in [22] for patterns with low cut-sizes on general graph structures. The results indicate that in
general no significant gains over unstructured settings are possible for localizing the activated
vertices; however, if the activation pattern coincides with a dendrogram over the graph, then the graph
structure can be exploited to design adaptive compressive measurements that yield SNR improvements,
or equivalently savings in the number of measurements needed. Two methods are proposed
in [22] that perform modifications of a compressive binary search over the dendrogram. Comparing
these methods to the sequentially designed compressed sensing algorithm (SDC) from [23] (which
does not exploit structure, but has near-optimal performance for localization of non-zero entries in
unstructured sparse vectors) indicates the importance of exploiting structure (see Figure 4).


Figure 4: Localization error for proposed Algorithms 1, 2, and SDC from [23], which does not exploit
structure. When the activation is very small, $k = 10$ (left), all algorithms perform the same, but
when the activation size is $k = 50$, exploiting structure leads to significant improvement. Here $G$ is a
512-node line graph and $\rho = 2$, resulting in one connected activated subgraph.


3  Conclusion 

This project addressed the problems of detection, localization and estimation of weak and distributed
patterns of activation in a large-scale network given access to direct, compressive and
adaptive noisy node measurements. Information-theoretic limits were identified for these problems,
along with computationally efficient methods that nearly achieve the limits, for general graph
structures and classes of activation patterns. Development of such state-of-the-art methods that
are both computationally and statistically efficient is crucial to advance AFOSR’s ability to monitor,
understand and secure modern large-scale networks. The methods developed leveraged highly
inter-disciplinary tools, and resulted in publications including invited papers and oral presentations
at NIPS, AISTATS, Asilomar and GlobalSIP, some of the most prominent conferences in machine
learning, statistics and signal processing.


4  People  involved  in  various  aspects  of  project 

Graduate  Students: 

•  James  Sharpnack  (PhD  student,  Machine  Learning  Department;  now  postdoc,  University  of 
California  -  San  Diego) 

•  Akshay  Krishnamurthy  (PhD  student,  Computer  Science  Department) 

Faculty  Collaborator: 

•  Alessandro Rinaldo (Assistant Professor (now Associate), Statistics Department)